REPLICATION MATERIALS

Paper: The Impact of Welfare on Intergroup Relations: Caste-Based Social Insurance and Social Integration in India
Author: Akshay Govind Dixit



TABLE OF CONTENTS

1. Scope of Replication
2. Software Dependencies
3. Stata Add-ons
4. Secondary Data Sources
5. Survey Data & Questionnaires
6. Folder Organization
7. Do Files
8. Output Files
9. Computations Reported in the Text



1. SCOPE OF REPLICATION

This README provides instructions for reproducing the results in the paper.

The replication package includes the original survey data, which allow reproduction of all tables, figures, and computations based on these data.

It does not include the secondary datasets used in the analysis.
Instructions for obtaining these datasets are provided in Section 4.

The replication package includes Stata code that generates all results reported in the main text and supplementary materials.

Five of the .do files (10 through 14) run directly on the included survey data.

The remaining scripts rely on secondary datasets and are therefore commented out in the master .do file (0_master.do).
To reproduce the full set of results, users need to obtain the relevant secondary datasets.
Once these datasets are placed in the designated folders (see Section 6), the commented-out scripts in the master .do file can be activated to regenerate the associated results.

All tables, figures, and computations are reproducible from code and data, except the maps (Figures 4, 5, and S7).
These maps were made using ArcGIS Online.
Figures 5 and S7 rely on village-level coordinates from the 2001 Census of India. 
The coordinates were obtained from ML Infomaps Pvt. Ltd. and accessed via the Harvard University Library. 
This dataset is not publicly available. 
Interested users should contact ML Infomaps Pvt. Ltd. for licensing information.
 


2. SOFTWARE DEPENDENCIES: STATA, EXCEL & TEX

Stata is required to reproduce the analyses in the paper. 

The code was last modified in Stata 19.

The entire code takes roughly 8 minutes to run using Stata/MP on a computer with 48 GB of memory running macOS Sequoia 15.7.
The code for the primary survey data alone (excluding all secondary data) takes about 1.5 minutes to run.

On some systems, Stata may require permission to access files saved on the Desktop or in cloud-synced folders (e.g., Dropbox, iCloud, OneDrive).
Please ensure Stata has permission, and that the replication package is synced locally, before running the code.

Microsoft Excel and a TeX viewer are needed to view the output tables, which are exported in .xlsx, .xls, or .tex format.



3. STATA ADD-ONS

Install the Stata commands listed below prior to running the code.

The ado folder includes documentation and version information on the Stata add-ons.

ssc install outreg2
ssc install coefplot
ssc install schemepack
ssc install distinct
ssc install egenmore
ssc install elabel
net install profileplot, from("https://stats.oarc.ucla.edu/stat/stata/ado/analysis")
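For convenience, the same add-ons can be installed in a single loop. This is a minimal sketch rather than part of the replication package: it requires internet access, and the replace option fetches the current versions, which may differ from the versions documented in the ado folder.

```stata
* Install all required add-ons in one step (sketch; requires internet access).
* The replace option overwrites any copies that are already installed.
local ssc_pkgs outreg2 coefplot schemepack distinct egenmore elabel
foreach pkg of local ssc_pkgs {
    ssc install `pkg', replace
}

* profileplot is distributed by UCLA OARC rather than SSC
net install profileplot, from("https://stats.oarc.ucla.edu/stat/stata/ado/analysis") replace
```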



4. SECONDARY DATA SOURCES

The secondary datasets used in the analysis can be obtained as follows:

India Human Development Survey II (IHDS) 2011-12: Data and documentation available at https://ihds.umd.edu/data/data-download.
Download the 2011-12 dataset. This requires reviewing and agreeing to the terms of use.


Rural Economic and Demographic Survey (REDS) 1999: Data and documentation available at https://afosterri.org/arisreds_data/.
Download the data in the folder "public99/".


Consumer Pyramids Household Survey (CPHS): Requires a paid (individual or institutional) subscription. 
With a subscription, data and documentation are available at https://consumerpyramidsdx.cmie.com.

CPHS data are distributed as individual zip files.
Download and unzip the following data files (covering May 2017 to April 2019):
	i. Member income: from 20170531 to 20190430
	ii. Household income: from 20170531 to 20190430
	iii. Aspirational India: from 20170501_20170831 to 20190101_20190430
	iv. Consumption pyramids: from 20170531 to 20190430


2011 Census of India, District Census Handbook, Village Directory - Andhra Pradesh: Available at https://web.archive.org/web/20170711235634/http://www.censusindia.gov.in/2011census/dchb/DCHB_Village_Release_2800.xlsx.
Original URL no longer active; archived version cited. 


Socioeconomic High-resolution Rural-Urban Geographic Platform for India (SHRUG): Data and documentation available at https://www.devdatalab.org/shrug.
These data are described in more detail in Asher, Lunt, Matsuura and Novosad (2021).


Note that secondary data are not included in the replication package because their terms of use either prohibit distribution (IHDS, CPHS), discourage it (SHRUG), or do not clarify if distribution is allowed (REDS, Census).
For this reason, and to ensure proper attribution, users should obtain the data from the respective sources.



5. SURVEY DATA & QUESTIONNAIRES

The survey data are stored in data/survey and include:

Household Survey V2.dta: Household-level data from 3,020 households.
The relevant survey instrument for this data is questionnaires/Household questionnaire.pdf.

Village Survey V2.dta: Village-level data collected from elected representatives across 75 villages.
The relevant survey instrument for this data is questionnaires/Village questionnaire.pdf.

sample_villages.csv: List of the sample villages where the survey was conducted.
This is stored in the folder "data".



6. FOLDER ORGANIZATION

The replication package is organized as follows:

Files: 
	* README.txt: instructions
	* sample_villages.csv: List of survey sample villages


Folders:
	* ado
	* analysis
		- survey_main
		- survey_robustness
	* data
		- survey
	* do
	* questionnaires


data/survey contains survey data from 3,020 households and 75 villages.

Folders for secondary datasets (cphs, reds99, shrug, ihds) do not exist by default.
Users must obtain the relevant secondary datasets and place them in the designated folders as follows:

	* data
		- cphs
			aspirational_india_20170501_20170831_R.csv
			consumption_pyramids_20170531_MS_rev.csv
			household_income_20170531_MS_rev.csv
			member_income_20170531_MS_rev.csv
			[and so on]
		- ihds
			-- ICPSR_36151
				--- DS001
					36151-0001-Data.dta
				--- DS002
					36151-0002-Data.dta
				--- DS003
				[and so on]
		- reds99
			rd99001.dta
			rd99002.dta
			rd99003.dta
			rd99004.dta
			[and so on]
		- shrug
			-- dta_shrug-v1.5.samosa-pop-econ-census-dta
				--- shrug-v1.5.samosa-pop-econ-census-dta
					shrug_pc11.dta
			-- shrug-shrid-keys-dta
				shrid_loc_names.dta
			[and so on]
		- survey
			Household Survey V2.dta
			Village Survey V2.dta


Save the file from the 2011 Census of India, District Census Handbook (DH_2011_DCHB_Village_Release_2800.xlsx) in the main "data" folder.

Any intermediate datasets produced by the code (e.g., clean data) are stored along with the respective raw datasets.
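The empty secondary-data folders can be created from within Stata before the downloaded files are copied into them. A minimal sketch, assuming Stata's working directory is the root of the replication package (the folder that contains data and do); capture suppresses the error mkdir raises when a folder already exists:

```stata
* Create the folders expected for secondary data (sketch).
* Assumes the working directory is the replication package root.
capture mkdir "data"
foreach d in cphs ihds reds99 shrug {
    capture mkdir "data/`d'"
}
```

The subfolders inside each dataset folder (e.g., ihds/ICPSR_36151/DS001) come from unzipping the downloaded archives and are not created here.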



7. DO FILES

All Stata code scripts are stored in the folder titled "do".

Set username and directory in 0_master.do. 
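The exact contents of 0_master.do are not reproduced here. As a hypothetical illustration only, a per-user path setting often looks like the following; the username jdoe, the path, and the global macro name root are placeholders, not the names used in the actual master file.

```stata
* Hypothetical directory setup; replace the username and path with your own.
* c(username) holds the login name of the current user.
if "`c(username)'" == "jdoe" {
    global root "/Users/jdoe/Documents/replication"
}
else {
    * Fall back to the current working directory
    global root "`c(pwd)'"
}
cd "$root"
```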

The name of each .do file indicates the dataset it uses.

By default, all .do files using secondary data (reds, ihds, cphs, census) are commented out.
The commented-out .do files can be activated once the associated secondary data have been downloaded.

For the CPHS and survey data, run the .do files exactly in the order specified below.
This is because output from earlier scripts is required to run later ones.

With all data downloaded and placed in designated folders, 0_master.do can be run for a single-click replication.


The list of .do files, along with their function and output, is as follows:

1_reds analyzes the REDS 1999 data to:
	- Produce Table S7 (terms of loans by source) 
	- Compute statistics reported in the "Caste and Social Insurance" section.

2_ihds analyzes the IHDS 2011-12 data to produce:
	- Table S17 (Correlation between household wealth and practice of untouchability)
	- Table S46 (Comparing landed and landless households in Telangana and Andhra Pradesh)
	- Statistic on residential segregation reported in the "Caste and Social Insurance" section.

3_cphs_import imports CPHS .csv files into Stata.

4_cphs_merge merges CPHS datasets (member income, Aspirational India, income pyramids, consumption pyramids).

5_cphs_clean starts with the merged data created by 4_cphs_merge and creates variables for household-level characteristics.

6_cphs_descriptive analyzes CPHS data to produce the following descriptive results: 
	- Table S8 (Proportion of households borrowing from relatives or friends)
	- Table S9 (Income and consumption within castes)
	- Descriptive stats on loans by within-caste income quintile, reported in the "Caste and Social Insurance" section.

7_cphs_regression analyzes CPHS data to produce the following results:
	- Figure 2 (Income from government transfers)
	- Figure 3 (Dynamic Treatment Effects: Effect of RBS on Borrowing from Relatives or Friends)
	- Table 1 (Effect of RBS on Borrowing from Relatives or Friends)

	- Figure S8 (Fraction borrowing from relatives or friends)
	- Table S1 (Effect of RBS on borrowings from relatives/friends for consumption)
	- Table S2 (Effect of RBS on borrowings from relatives/friends by caste category)
	- Table S14 (Effect of RBS on borrowings from sources other than relatives/friends)
	- Table S21 (Dynamic treatment effects: Effect of RBS on borrowing from relatives or friends)
	- Table S22 (Income from government transfers)
	- Table S45 (Effect of RBS on borrowings from relatives/friends (continuous treatment variable)) 
	- Table S47 (Effect of RBS on borrowing for consumption and investment).

8_cphs_bordering_districts analyzes CPHS data to produce:
	- Figure S9, Table S20 (Dynamic treatment effects: Bordering districts in neighboring states).

9_census_balance_check analyzes data from the 2011 Population Census and the 2013 Economic Census to produce:
	- Table S18 (comparison of villages along the Telangana/Andhra Pradesh border).

10_survey_village_clean cleans data from the village survey, and produces:
	- Figure 1 (Variation in Caste-Based Inequality in Land Ownership in the Survey Sample)
	- Table S12 (Comparison of lower and higher inequality villages)
	- Table S19 (Comparison of villages using 2023 survey data).

11_survey_household_clean cleans data from the household survey for analysis.

12_survey_descriptive produces the following results using the survey data:
	- Table 2 (Composition of Sample Households by State and Landownership Status)
	- Table 3 (Welfare Amount Received from All Programs in the Last 12 Months)
	- Figure S4 (Illustration of differences in land ownership by caste)
	- Table S11 (Comparison of households in lower and higher inequality villages)
	- Table S16 (Correlation between income and behavior towards outgroups)
	- Table S50 (Comparing landless households in Telangana and Andhra Pradesh)
	- Table S52 (Correlation between festival spending and sharing meals with other castes)
	- Descriptive stats reported in the section "The Impact of Welfare on Caste-based Social Integration."

13_survey_analysis produces the following results using the survey data:
	- Table 4 (Average Effect of RBS on Borrowing from Caste Members)
	- Table 5 (The Effect of RBS on Borrowing from Caste Members and Festival Spending, by Inequality in Land Ownership)
	- Figure 6 (The Effect of Welfare on Intercaste Ties)
	- Figure 7 (Inequality and the Effect of Welfare on Intercaste Ties)
	- Figure 8 (The Effect of Welfare on Attitudes Towards Other Castes)
	- Figure S5, Table S28 (Effect of welfare on preferred attributes of MLA candidates)
	- Table S13 (Effect on perceptions of inequality)

	It also produces more detailed versions of some of the above main results:
	- Table S51 (detailed version of Table 4)
	- Tables S10 and S15 (detailed version of Table 5)
	- Tables S3 and S4 (tabular versions of Figure 6)
	- Tables S23-S26 (tabular versions of Figure 7)
	- Tables S5 and S27 (tabular versions of Figure 8)

14_survey_robustness_checks produces the following supplementary results using survey data:
	- Figure S1, Tables S29-S32 (Controlling for caste)
	- Figure S2, Tables S33-S36 (Controlling for public goods provision)
	- Figure S3, Tables S41-S44 (Robustness check for definition of low inequality, using mean instead of median)
	- Figure S6, Tables S37-S40 (Controlling for respondent characteristics)
	- Figure S10, Tables S53-S56 (Robustness check for definition of low inequality, using a threshold of 1.5 instead of median)
	- Table S6 (Heterogeneity by travel distance by road to the other state)
	- Table S57 (Effect of welfare on incentivized donation among SC respondents)



8. OUTPUT FILES

Figures and tables produced by the .do files are saved in the analysis folder.

Except for a placeholder README file, this folder is empty until code is run and output is produced.

Output from 13_survey_analysis is stored in analysis/survey_main.

Output from 14_survey_robustness_checks is stored in analysis/survey_robustness.

A detailed mapping of output files to figure and table numbers is given below.


Output that appears in the main text, in order of appearance:

	- Figure 1: kdensity_land_to_pop.pdf
	- Table 1: rb_borrowings_impact_districtwave.tex
	- Figure 2: rb_transfers_trend.pdf
	- Figure 3: rb_borrowings_trend.pdf
	- Figure 4: [map made using ArcGIS, no Stata output] 
	- Figure 5: [map made using ArcGIS, no Stata output]
	- Table 2: [tabulated in line 31 of 12_survey_descriptive.do]
	- Table 3: welfare_benefits.xlsx
	- Table 4: survey_main/borrowed_caste.tex
	- Table 5: survey_main/borrowed_caste.tex AND survey_main/festival_spending.tex
	- Figure 6: survey_main/intercaste_behavior.pdf
	- Figure 7: survey_main/het_inequality_intercaste_behavior.pdf
	- Figure 8: survey_main/het_inequality_intercaste_attitudes.pdf


Output that appears in the supplementary materials:

	- Figure S1: survey_robustness/control for caste.png
	- Figure S2: survey_robustness/control for public goods.png
	- Figure S3: survey_robustness/het_inequality_intercaste_behavior_3.png
	- Figure S4: box_plot_major_caste.png
	- Figure S5: survey_main/candidate_preference.png
	- Figure S6: survey_robustness/control for respondent characteristics.png
	- Figure S7: [map made using ArcGIS, no Stata output]
	- Figure S8: borrowings_raw_trend.png
	- Figure S9: borrowings_trend_borderingdistricts.png
	- Figure S10: survey_robustness/het_inequality_intercaste_behavior_1_5.png

	- Table S1: rb_borrowings_impact_districtwave.tex
	- Table S2: rb_het_caste.tex
	- Table S3: survey_main/donation.tex
	- Table S4: survey_main/intercaste_behavior.tex
	- Table S5: survey_main/inequality_no_intercaste_attitudes.tex
	- Table S6: survey_robustness/het effects by distance to border.tex
	- Table S7: reds_terms_of_loan.xlsx
	- Table S8: caste_borrowings_purpose.xlsx
	- Table S9: redistribution_within_castes.xlsx
	- Table S10: survey_main/borrowed_caste.tex

	- Table S11: balance_check_inequality_hhsurvey_AP.xlsx
	- Table S12: balance_check_inequality.xlsx
	- Table S13: survey_main/inequality_increased.tex
	- Table S14: rb_borrowings_impact_bysource.tex
	- Table S15: survey_main/festival_spending.tex
	- Table S16: exploring_income_effect.tex
	- Table S17: ihds_exploring_income_effect.tex
	- Table S18: balance_check.xlsx
	- Table S19: balance_check_survey.xlsx
	- Table S20: borrowings_trend_borderingdistricts.tex

	- Table S21: rb_borrowings_trend.tex
	- Table S22: rb_transfers_trend.tex
	- Table S23: survey_main/inequality_no_donation.tex
	- Table S24: survey_main/inequality_no_intercaste_behavior.tex
	- Table S25: survey_main/inequality_yes_donation.tex
	- Table S26: survey_main/inequality_yes_intercaste_behavior.tex
	- Table S27: survey_main/inequality_yes_intercaste_attitudes.tex
	- Table S28: survey_main/candidate_preference.tex
	- Table S29: survey_robustness/control for caste_donation_low.tex
	- Table S30: survey_robustness/control for caste_behavior_low.tex

	- Table S31: survey_robustness/control for caste_donation_high.tex
	- Table S32: survey_robustness/control for caste_behavior_high.tex
	- Table S33: survey_robustness/control for public_donation_low.tex
	- Table S34: survey_robustness/control for public_behavior_low.tex
	- Table S35: survey_robustness/control for public_donation_high.tex
	- Table S36: survey_robustness/control for public_behavior_high.tex
	- Table S37: survey_robustness/control for respondent_donation_low.tex
	- Table S38: survey_robustness/control for respondent_behavior_low.tex
	- Table S39: survey_robustness/control for respondent_donation_high.tex
	- Table S40: survey_robustness/control for respondent_behavior_high.tex

	- Table S41: survey_robustness/het_inequality_3_no_donation.tex
	- Table S42: survey_robustness/het_inequality_3_no_intercaste_behavior.tex
	- Table S43: survey_robustness/het_inequality_3_yes_donation.tex
	- Table S44: survey_robustness/het_inequality_3_yes_intercaste_behavior.tex
	- Table S45: rb_borrowings_intensity.tex
	- Table S46: ihds_untouchability.tex
	- Table S47: rb_borrowings_impact_bypurpose.tex
	- Table S48: [no Stata output]
	- Table S49: [no Stata output]
	- Table S50: balance_check_landless.xlsx

	- Table S51: survey_main/borrowed_caste.tex
	- Table S52: share_meal_festival.tex
	- Table S53: survey_robustness/het_inequality_1_5_no_donation.tex
	- Table S54: survey_robustness/het_inequality_1_5_no_intercaste_behavior.tex
	- Table S55: survey_robustness/het_inequality_1_5_yes_donation.tex
	- Table S56: survey_robustness/het_inequality_1_5_yes_intercaste_behavior.tex
	- Table S57: survey_robustness/donation_sc.tex
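After running the survey .do files, the presence of the main-text survey outputs (Figure 1, Table 3, Tables 4 and 5, and Figures 6-8) can be checked with a short loop. A minimal sketch, assuming Stata's working directory is the package root and that files listed above without a subfolder are saved directly in the analysis folder; the scalar name nmissing is illustrative:

```stata
* Report which main-text survey outputs are missing (sketch).
* Paths are relative to the analysis folder, as in the mapping above.
scalar nmissing = 0
foreach f in "kdensity_land_to_pop.pdf" "welfare_benefits.xlsx" ///
    "survey_main/borrowed_caste.tex" "survey_main/festival_spending.tex" ///
    "survey_main/intercaste_behavior.pdf" ///
    "survey_main/het_inequality_intercaste_behavior.pdf" ///
    "survey_main/het_inequality_intercaste_attitudes.pdf" {
    capture confirm file "analysis/`f'"
    if _rc {
        display as error "Not found: analysis/`f'"
        scalar nmissing = scalar(nmissing) + 1
    }
}
display scalar(nmissing) " of 7 expected files not found"
```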


 
9. COMPUTATIONS REPORTED IN THE TEXT

The .do files also compute statistics that are reported in the text or footnotes, outside of tables or figures.

These statistics are listed below along with the relevant code locations.
	
	- "Assuming that all loans from friends are within-caste, borrowing from relatives and friends 
	would account for 84% of all borrowing from relatives, friends, and caste members. If just 
	one-third of borrowing from friends is within-caste, relatives and friends together would 
	still account for 64%."
	Based on numbers in reds_terms_of_loan.xlsx.
	1_reds.do, lines 120-146 discuss the computation.


	- "Although not explicitly indicated in the REDS data, some loans from other non-institutional 
	sources, such as employers or landlords, may also be within-caste. Even under the extreme assumption 
	that all such loans are within-caste, borrowing from relatives and friends would account for a 
	majority of borrowing from relatives, friends, caste members, and other non-institutional 
	sources--52% if only a third of borrowing from friends is within-caste, and 69% if all borrowing
	from friends is within-caste."
	Based on numbers in reds_terms_of_loan.xlsx.
	1_reds.do, lines 120-146 discuss the computation.

	
	- "In terms of volume, loans from relatives comprised over 13% of total household debt from all 
	sources recorded in the REDS. Loans from friends accounted for just under 10%, and loans from
	other caste members about 3%."
	1_reds.do, line 304.

	- "The 1999 REDS interviewed 7,474 households from 253 villages across 16 states in India."
	1_reds.do, lines 24-26.


	- "Households that borrow from friends, relatives or caste members are also more likely to 
	receive gifts."
	1_reds.do, line 287.


	- "It does, however, provide a panel with a larger sample, including nearly 6,000 households 
	from Telangana alone."
	Based on the sample size reported in Table S8.


	- "Similar fractions (22%) of the top and bottom quintiles reported having borrowed from 
	relatives and friends for consumption. Households in quintiles two, three, and four were 
	somewhat more likely to report such borrowing (25%, 27% and 30%, respectively)."
	6_cphs_descriptive.do, line 140.

	
	- "Based on my 2023 household survey data, the median annual benefit from RBS was 15% 
	of the median annual income reported."
	12_survey_descriptive.do, lines 58-59.

	
	- "Among landowners in my sample, the median landholding is 2 acres, with 80% smaller 
	than 5 acres and just 3% larger than 10 acres."
	12_survey_descriptive.do, lines 89-93.


	- "The ratio is as low as 0.67 in a village in which a caste constitutes 30% of the 
	population while owning 20% of the land."
	10_survey_village_clean.do, lines 184 and 190.


	- "In the most extreme case, this land-to-population ratio is 12.5."
	10_survey_village_clean.do, line 184.

	
	- "My analysis includes 2,016 CPHS households engaged in agricultural activities."
	7_cphs_regression.do, line 37.


	- "The sample includes 1,624 in the treated group and 392 in the comparison group."
	7_cphs_regression.do, line 37.


	- "This includes 41 villages in Telangana and 34 in Andhra Pradesh."
	10_survey_village_clean.do, line 50.	

	
	- "SCs constitute 23-24% of the population in sample villages on both sides,"
	Reported in Table S18, Row (1).


	- "In my survey, when asked to split 10 'tokens of trust' between a family member
	and a random person of their state, landed respondents in Telangana assigned 3.3 tokens
	to a random person of their state, on average, and landless respondents assigned 3.5,
	comparable to 3.2 and 3.4, respectively, in Andhra Pradesh."
	12_survey_descriptive.do, lines 115-135.


	- "In the survey, 82% of respondents reported that they had lived in the same village
	their whole life. Only nine reported moving to the village where they were surveyed in 
	the last six years (since RBS was launched), of which four were in Andhra Pradesh."
	12_survey_descriptive.do, lines 68-69.

	
	- "The 3,020 households in the sample span 41 caste groups." 
	12_survey_descriptive.do, line 140.

	- "Scheduled Castes (SCs) constitute 34% of the sample, 40% of whom own land. 
	Backward Castes (BC) are 56% of the sample, 55% of whom own land. Other Castes 
	(OC) account for 8%, 55% of whom own land."
	12_survey_descriptive.do, lines 79 and 104.


	- "By this definition, there are 33 low inequality and 41 high inequality villages in the sample, 
	with one missing value."
	10_survey_village_clean.do, line 71.


	- "Of the 41 sample villages with relatively high caste-based inequality, Reddys owned a plurality 
	of the land in 29."
	10_survey_village_clean.do, line 76.


	- "Figure 7 shows that in these villages, welfare increased the amount that non-SCs were willing 
	to donate to an NGO working to educate children from marginalized castes by 19%."
	13_survey_analysis.do, lines 172-188.


	- "Further, welfare reduced the likelihood of respondents reporting that most or all of their 
	friends were of the same caste by 33%."
	13_survey_analysis.do, lines 204-225.


	- "RBS funds are substantial, equivalent to 15% of annual income for the median household."
	12_survey_descriptive.do, lines 58-59.
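The 0.67 land-to-population ratio quoted above can be checked by hand. This assumes the ratio is a caste's share of village land divided by its share of village population, which is the definition implied by the quoted example rather than one stated in this README:

```latex
\[
  \text{land-to-population ratio}
  = \frac{\text{caste's share of village land}}
         {\text{caste's share of village population}}
  = \frac{0.20}{0.30} \approx 0.67
\]
```

Under this definition, a value of 1 means land ownership is proportional to population, and the extreme value of 12.5 cited above corresponds to a caste whose land share is 12.5 times its population share.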
	

